Let $L$ be an infinite regular language on a totally ordered alphabet
$(\Sigma, <)$. Feeding a finite deterministic automaton (with output) with the
words of $L$ enumerated lexicographically with respect to $<$ leads to an
infinite sequence over the output alphabet of the automaton. This process
generalizes the concept of $k$-automatic sequence for abstract numeration
systems on a regular language (instead of systems in base $k$). Here, I study
the first properties of these sequences and their relations with numeration
systems.

1 Introduction

In [?], P. Lecomte and I defined a numeration system as a triple
$S = (L, \Sigma, <)$ where $L$ is an infinite regular language over a totally
ordered alphabet $(\Sigma, <)$. The lexicographic ordering of $L$ gives a
one-to-one correspondence $r_S$ between the set $\mathbb{N}$ of natural numbers
and the language $L$.

For a given subset $X$ of $\mathbb{N}$, a question arises naturally: is it
possible to find a numeration system $S$ such that $r_S(X)$ is recognizable by
finite automata? (In this case, $X$ is said to be $S$-recognizable.) For
example, the set $\{n^2 : n \in \mathbb{N}\}$ is $S$-recognizable for some $S$
and the arithmetic progressions $p + q\mathbb{N}$ are $S$-recognizable for any
$S$. An interesting question is thus the following: is there a system $S$ such
that the set of primes is $S$-recognizable?

To answer this question I show that a subset of $\mathbb{N}$ is
$S$-recognizable if and only if its characteristic sequence can be generated by
an 'automatic' method. The term automatic refers, as we shall see further, to a
generalization of the $k$-automatic sequences for numeration systems on a
regular language.

The $k$-automatic sequences are well known and have been extensively studied
since the 1970s [?, ?, ?]. The construction of this kind of sequence is based
on the representation of the integers in base $k$. For a given integer $n$, one
represents this number in base $k$ using the greedy algorithm and obtains a word
$[n]_k$ over the alphabet $\{0, \ldots, k-1\}$. Next one gives $[n]_k$ to a
deterministic finite automaton with output and obtains the $n$-th term of a
sequence which is said to be a $k$-automatic sequence.

These sequences have already been generalized in different ways [?]. In
particular, a method used by J. Shallit to generalize the $k$-automatic
sequences is to consider some kind of linear numeration system instead of the
standard numeration system with integer base $k$ [?]. Two properties of the
systems encountered in [?] are precisely that the set of all the
representations is regular and that the lexicographic ordering is respected.

Here, instead of giving $[n]_k$ to a deterministic finite automaton with output,
we feed it with $r_S(n)$ to obtain an output which is the $n$-th term of an
$S$-automatic sequence for a numeration system $S$. Having thus introduced the
concept of $S$-automatic sequences, we can follow two paths: study their
intrinsic properties, but also use them as a tool to check whether a subset of
$\mathbb{N}$ is $S$-recognizable.

This article is organized as follows. In the first section, we recall some
definitions and we introduce a teaching example which could be instructive
for the reader not familiar with automatic sequences. In the second section, we
adapt the classical results concerning the fiber and the kernel of an automatic
sequence.

Initially, A. Cobham showed the equivalence between the $k$-automatic
sequences and the sequences obtained by iterating a uniform morphism (also
called uniform tag system [7]). In the third section, we show that an
$S$-automatic sequence is always generated by a substitution (i.e., an iterated
non-uniform morphism followed by one application of another morphism). From
this, we deduce that the number of distinct factors of length $l$ in an
$S$-automatic sequence is in $O(l^2)$. We also show how to construct
$S$-automatic sequences with at least the same complexity as infinite words
obtained by iterated morphisms.

In the last section, we show that for any numeration system $S$, the set of
primes is never $S$-recognizable. We use the fact that, to be $S$-recognizable,
the characteristic sequence of the set must be generated by a substitution.
Hence we use some results of C. Mauduit about the density of the infinite words
obtained by substitution [11, 12].

2 Basic definitions and notations 

In this paper, capital Greek letters represent finite alphabets. We denote by
$\Sigma^*$ the set of the words over $\Sigma$ ($\varepsilon$ is the empty word)
and by $\Sigma^\omega$ the set of the infinite words over $\Sigma$. If $K$ is a
set then $\#K$ denotes the cardinality of $K$ and if $w$ is a string then $|w|$
denotes the length of $w$. For $1 \le i \le |w|$, $w_i$ is the $i$-th letter of
$w$. The same notation holds for infinite words; in this case
$i \in \mathbb{N} \setminus \{0\}$.
First, recall some definitions about the numeration systems we are dealing
with. For more about these systems see [?].

Definition 1 A numeration system $S$ is a triple $(L, \Sigma, <)$ where $L$ is
an infinite regular language over the totally ordered alphabet $(\Sigma, <)$.

For each $n \in \mathbb{N}$, $r_S(n)$ denotes the $(n+1)$-th word of $L$ with
respect to the lexicographic ordering and is called the $S$-representation of
$n$.

Remark that the map $r_S : \mathbb{N} \to L$ is an increasing bijection. For
$w \in L$, we set $\mathrm{val}_S(w) = r_S^{-1}(w)$. We call
$\mathrm{val}_S(w)$ the numerical value of $w$.
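
As an illustration only, the maps $r_S$ and $\mathrm{val}_S$ can be sketched in Python by brute-force enumeration. The sketch assumes that words are enumerated by increasing length and, within a given length, lexicographically (the order of the list of the first words of $a^*b^*$ in Section 2.1). The function names and the choice $L = a^*b^*$ are illustrative assumptions, not part of the definitions.

```python
from itertools import count, islice, product


def enumerate_language(accepts, alphabet):
    """Yield the words accepted by `accepts`, shortest first and, within a
    given length, in lexicographic order for the order of `alphabet`."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if accepts(word):
                yield word


def in_a_star_b_star(word):
    # Membership in a*b*: only letters a, b and no 'a' after a 'b'.
    return set(word) <= {"a", "b"} and "ba" not in word


def rep(n, accepts=in_a_star_b_star, alphabet="ab"):
    """r_S(n): the (n+1)-th word of L."""
    return next(islice(enumerate_language(accepts, alphabet), n, None))


def val(word, accepts=in_a_star_b_star, alphabet="ab"):
    """val_S(w): the rank of w in L, counted from 0."""
    if not accepts(word):
        raise ValueError("word is not in L")
    for n, candidate in enumerate(enumerate_language(accepts, alphabet)):
        if candidate == word:
            return n
```

With these assumptions, `rep(4)` is the word `ab` and `val("bbb")` is 9. The enumeration is exponential in the word length, so this is only meant to make the definitions concrete.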

Examples of such systems are the numeration systems defined by a recurrence
relation whose characteristic polynomial is the minimum polynomial of a
Pisot number. (Indeed, with this hypothesis, the set of representations of the
integers is a regular language.) The standard numeration systems with integer
base and also the Fibonacci system belong to this class.

Definition 2 Let $S$ be a numeration system. A subset $X$ of $\mathbb{N}$ is
$S$-recognizable if $r_S(X)$ is recognizable by finite automata.

Let us introduce the concept of $S$-automatic sequence, which naturally
generalizes the $k$-automatic sequences based on the representation of the
integers in base $k$. For more about $k$-automatic sequences see for instance
[?].

Definition 3 A deterministic finite automaton with output (DFAO) $M$ is a
6-tuple $(K, s, \Sigma, \delta, \Delta, \tau)$ where $K$ is the finite set of
states, $s$ is the start state, $\Sigma$ is the input alphabet,
$\delta : K \times \Sigma \to K$ is the transition function, $\Delta$ is the
output alphabet and $\tau : K \to \Delta$ is the output function.

Definition 4 Let $S = (L, \Sigma, <)$ be a numeration system. A sequence
$u \in \Delta^\omega$ is $S$-automatic if there exists a DFAO
$M = (K, s, \Sigma, \delta, \Delta, \tau)$ such that for all
$n \in \mathbb{N}$,

$$u_{n+1} = \tau(\delta(s, r_S(n))).$$

If the context is clear, we write $\tau(w)$ in place of $\tau(\delta(s, w))$.

Remark 1 A subset $X \subseteq \mathbb{N}$ is $S$-recognizable if and only if
its characteristic sequence $\chi_X \in \{0, 1\}^\omega$ is $S$-automatic.

In the following we will often encounter two more 'classical' ways of obtaining 
infinite sequences. 

Definition 5 Let $\varphi : \Sigma \to \Sigma^*$ be a morphism of monoids such
that for some $\sigma \in \Sigma$, $\varphi(\sigma) \in \sigma\Sigma^*$. The
word $u_\varphi = \varphi^\omega(\sigma)$ is a fixed point of $\varphi$ and we
say that $u_\varphi$ is generated by an iterated morphism.

A morphism is uniform if $|\varphi(\sigma_1)| = \cdots = |\varphi(\sigma_n)|$,
where $\Sigma = \{\sigma_1, \ldots, \sigma_n\}$.

Definition 6 A substitution $T$ is a triple $(\varphi, h, c)$ such that
$\varphi : \Sigma \to \Sigma^*$ and $h : \Sigma \to \Delta^*$ are morphisms of
monoids. Moreover $c \in \Sigma$, $\varphi(c) \in c\Sigma^*$ and for any
$\sigma \in \Sigma$, $h(\sigma) = \varepsilon$ or $h(\sigma) \in \Delta$ ($h$ is
said to be a weak coding). We say that the word $u_T = h(\varphi^\omega(c))$
over $\Delta$ is generated by the substitution $T$.

If $h(\sigma) = \varepsilon$ for some $\sigma$ then $h$ is said to be erasing;
otherwise $h$ is said to be non-erasing.

2.1 A teaching example 

We consider the numeration system $S = (a^*b^*, \{a, b\}, a < b)$, the
alphabets $\Sigma = \{a, b\}$, $\Delta = \{0, 1, 2, 3\}$ and the following DFAO

[Figure: transition diagram of the DFAO]

As usual the start state is indicated by an unlabeled arrow. The first words of
$a^*b^*$ are

$\varepsilon$, a, b, aa, ab, bb, aaa, aab, abb, bbb, ...

and thus feeding the automaton with these words we obtain the first terms of
the sequence $u \in \Delta^\omega$,

u = 01023031200231010123023031203120231002310123010123 ....
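
The printed terms of $u$ can be regenerated with a short Python sketch. The transition table below is an assumption: it is one four-state DFAO, with output equal to the state, that reproduces the fifty printed terms; it is not copied from the original figure, and its behaviour on words outside $a^*b^*$ is irrelevant.

```python
from itertools import count, islice

# Assumed DFAO: states 0..3, start state 0, output tau(q) = q.
# Reading a adds 1 modulo 4; reading b fixes 0 and permutes 1 -> 3 -> 2 -> 1.
DELTA = {
    (0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0,
    (0, "b"): 0, (1, "b"): 3, (3, "b"): 2, (2, "b"): 1,
}


def output(word, start=0):
    """tau(delta(s, word)) for the assumed DFAO."""
    state = start
    for letter in word:
        state = DELTA[(state, letter)]
    return state


def words_a_star_b_star():
    """Words of a*b*, shortest first, then lexicographically with a < b."""
    for n in count(0):
        for j in range(n + 1):
            yield "a" * (n - j) + "b" * j


def prefix_of_u(length):
    """First `length` terms of the sequence u, as a string of digits."""
    return "".join(str(output(w)) for w in islice(words_a_star_b_star(), length))
```

Calling `prefix_of_u(50)` returns the fifty terms printed above, which is the only evidence used to choose the table.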



Remark 2 The sequence $u$ is not ultimately periodic. One can observe that
the distance between two occurrences of the block '00' is not bounded. Indeed,

$$\tau(w) = 0 \iff \exists r, s \in \mathbb{N} : w = a^{4r} b^{s} \qquad (1)$$

thus a block '00' comes from two consecutive words $b^{4r-1}$ and $a^{4r}$,
$r \ge 1$, and the number of words of length $n$ in $a^*b^*$ is $n + 1$,
$n \in \mathbb{N}$.
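
A quick numerical check of this remark can be written using only characterization (1), read as: the word $a^i b^j$ gives the output 0 exactly when $i$ is a multiple of 4. The helper name and the 0-based indexing of positions are assumptions of this sketch.

```python
def double_zero_positions(max_len):
    """Indices k (0-based) such that the k-th and (k+1)-th words of a*b*
    both give the output 0, among the words of length at most max_len."""
    is_zero = []
    for n in range(max_len + 1):
        for j in range(n + 1):          # the word a^(n-j) b^j
            is_zero.append((n - j) % 4 == 0)
    return [k for k in range(len(is_zero) - 1) if is_zero[k] and is_zero[k + 1]]


positions = double_zero_positions(20)
gaps = [q - p for p, q in zip(positions, positions[1:])]
```

Here `positions` lists where the blocks '00' start and `gaps` the distances between consecutive blocks; the distances keep growing, as the remark states.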



Remark 3 The sequence $u$ is not generated by an iterated morphism $\varphi$.
First observe that

$$\tau(w) = 1 \iff \exists r, s \in \mathbb{N} : w \in \{a^{4r+1}b^{3s},\ a^{4r+2}b^{3s+1},\ a^{4r+3}b^{3s+2}\},$$
$$\tau(w) = 2 \iff \exists r, s \in \mathbb{N} : w \in \{a^{4r+1}b^{3s+2},\ a^{4r+2}b^{3s},\ a^{4r+3}b^{3s+1}\},$$
$$\tau(w) = 3 \iff \exists r, s \in \mathbb{N} : w \in \{a^{4r+1}b^{3s+1},\ a^{4r+2}b^{3s+2},\ a^{4r+3}b^{3s}\}.$$

Suppose that there exists a morphism $\varphi$ such that
$u = \lim_{n \to +\infty} \varphi^n(0)$.

1) If $\varphi(0) \in 0102\Delta^*$ then the block '0102' must appear at least
twice in $u$ since '0' appears at least twice in $u$. If the first '0' of the
block is obtained from a word $a^{4r}b^s$ with $r \ge 1$ then the second '0' is
obtained from $a^{4r-2}b^{s+2}$, which leads to a contradiction in view of (1).
If the first '0' is obtained from $b^s$ with $s \ge 1$ then the second '0'
comes from $a^s b$ and we have $s = 4t$. The '2' is then obtained from
$a^{4t-1}b^2$, which also leads to a contradiction.

2) If $\varphi(0) = 01$ then in view of the first terms of $u$,
$\varphi(1) \in 023031200231\Delta^*$. We show that '023031200' appears only
once in $u$. Suppose that we can find another block of this kind. Then the last
two '0' come from words $b^{4r-1}$ and $a^{4r}$ with $r \ge 2$. Since we
consider all the words of $a^*b^*$ lexicographically ordered, the first '0' of
the block comes from $a^7 b^{4r-8}$, which is in contradiction with (1).

3) If $\varphi(0) = 010$ then $\varphi(1) \in 23031200231\Delta^*$ and
$\varphi(010) \in 01023031200\Delta^*$. The block '010' appears at least twice
in $u$ but we know that '023031200' appears only once.

We shall see further that $u$ is generated by a substitution.

3 First results about S-automatic sequences

Some classical results about $k$-automatic sequences can be easily restated [?, ?].

Definition 7 Let $a \in \Delta$ and $S = (L, \Sigma, <)$. The $S$-fiber
$\mathcal{F}_S(u, a)$ of a sequence $u \in \Delta^\omega$ is defined as follows:

$$\mathcal{F}_S(u, a) = \{r_S(n) : u_n = a\}.$$

Theorem 8 Let $u$ be an infinite sequence over $\Delta$ and
$S = (L, \Sigma, <)$. The sequence $u$ is $S$-automatic if and only if for all
$a \in \Delta$, $\mathcal{F}_S(u, a)$ is a regular subset of $L$.

Proof. If $u$ is $S$-automatic then we have a DFAO
$M = (K, s, \Sigma, \delta, \Delta, \tau)$ which is used to generate $u$. Let
$L(M')$ be the language recognized by the DFA $M' = (K, s, \Sigma, \delta, F)$
where the set of final states $F$ only contains the states $k$ such that
$\tau(k) = a$. Therefore $\mathcal{F}_S(u, a)$ is regular since it is the
intersection of the two regular sets $L(M')$ and $L$.

The condition is sufficient. Let $\Delta = \{a_1, \ldots, a_n\}$. Remark that
if $i \ne j$, $\mathcal{F}_S(u, a_i) \cap \mathcal{F}_S(u, a_j) = \emptyset$
and $L = \bigcup_{i=1}^{n} \mathcal{F}_S(u, a_i)$. For all $i = 1, \ldots, n$,
$\mathcal{F}_S(u, a_i)$ is accepted by a DFA
$M_i = (K_i, s_i, \Sigma, \delta_i, F_i)$. From these automata we construct a
DFAO $M = (K, s, \Sigma, \delta, \Delta, \tau)$ to generate $u$ using the
numeration system $S$. The set $K$ is $K_1 \times \cdots \times K_n$, the
initial state is $(s_1, \ldots, s_n)$. For all states
$(q_1, \ldots, q_n) \in K$ and for all $\sigma \in \Sigma$,
$\delta((q_1, \ldots, q_n), \sigma) = (\delta_1(q_1, \sigma), \ldots, \delta_n(q_n, \sigma))$.
If there is a unique $i$ such that $q_i \in F_i$ then
$\tau((q_1, \ldots, q_n)) = a_i$; otherwise the state cannot be reached by a
word of $L$ and the output is not important. The sequence $u$ is obtained from
$S$ and the DFAO $M$, thus $u$ is $S$-automatic. □
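
The product construction of the second half of the proof can be sketched as follows. The toy system $L = a^*$ (so that $r_S(n) = a^n$) with the two fibers $(aa)^*$ and $a(aa)^*$, and all Python names, are assumptions chosen only to keep the example small.

```python
def product_dfao(dfas, letters):
    """Build (start, step, tau) from DFAs accepting the fibers.

    dfas[i] = (start_i, delta_i, finals_i) accepts the fiber of letters[i];
    delta_i maps (state, input letter) to a state.
    """
    start = tuple(s for s, _, _ in dfas)

    def step(state, x):
        return tuple(delta[(q, x)] for (_, delta, _), q in zip(dfas, state))

    def tau(state):
        hits = [i for i, ((_, _, finals), q) in enumerate(zip(dfas, state))
                if q in finals]
        # A word of L lies in exactly one fiber; a state with no hit or with
        # several hits is not reached by a word of L, so its output is free.
        return letters[hits[0]] if len(hits) == 1 else None

    return start, step, tau


def run(dfao, word):
    """Output of the product DFAO after reading `word`."""
    start, step, tau = dfao
    state = start
    for x in word:
        state = step(state, x)
    return tau(state)


# Toy data (assumption): both DFAs count the letters a modulo 2.
parity = {(0, "a"): 1, (1, "a"): 0}
even = (0, parity, {0})     # accepts (aa)*,  fiber of the letter "0"
odd = (0, parity, {1})      # accepts a(aa)*, fiber of the letter "1"
dfao = product_dfao([even, odd], letters=["0", "1"])
u_prefix = "".join(run(dfao, "a" * n) for n in range(8))
```

In this toy case `u_prefix` alternates between the two output letters, as expected from the two fibers.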

The notion of $k$-kernel of a $k$-automatic sequence can be transposed as
follows.

Definition 9 Let $S = (L, \Sigma, <)$ and $u$ be an infinite sequence. For each
$w \in \Sigma^*$, we set
$\mathcal{K}_w = \{v \in L \mid \exists z \in \Sigma^* : v = wz\}$. One can
enumerate $\mathcal{K}_w$ lexicographically with respect to $<$,
$\mathcal{K}_w = \{wz_0 < wz_1 < \cdots\}$. Thus for each $w \in \Sigma^*$, one
can construct the subsequence $n \mapsto u_{\mathrm{val}_S(wz_n)}$ (remark that
the subsequence can be finite or even empty).

Theorem 10 Let $S = (L, \Sigma, <)$. A sequence $u \in \Delta^\omega$ is
$S$-automatic if and only if
$\{n \mapsto u_{\mathrm{val}_S(wz_n)} : w \in \Sigma^*\}$ is finite.

Proof. If $u$ is $S$-automatic, we have a DFAO
$M = (K, s, \Sigma, \delta, \Delta, \tau)$ used to generate $u$ and we define
the equivalence relation $\sim_1$ over $\Sigma^*$ by $x \sim_1 y$ if and only if
$\delta(s, x) = \delta(s, y)$. In the same way, the minimal automaton of $L$
provides an equivalence relation $\sim_2$. The two relations have a finite
index, thus the relation $\sim_{1,2}$ given by $x \sim_{1,2} y$ if and only if
$x \sim_1 y$ and $x \sim_2 y$ also has a finite index. Remark that each class
of $\sim_{1,2}$ gives one of the sequences
$n \mapsto u_{\mathrm{val}_S(wz_n)}$. Indeed, $x \sim_2 y$ implies that
$\{z \in \Sigma^* : xz \in L\} = \{z \in \Sigma^* : yz \in L\}$, thus
$\mathcal{K}_x = \{xz_0 < xz_1 < \cdots\}$ and
$\mathcal{K}_y = \{yz_0 < yz_1 < \cdots\}$ with the same $z_0, z_1, \ldots$.

The condition is sufficient. We show how to construct a DFAO. The states
are the subsequences $q_w = (n \mapsto u_{\mathrm{val}_S(wz_n)})$. The initial
state is $q_\varepsilon$ (i.e., the subsequence obtained from the empty word).
The transition function $\delta$ is given by
$\delta(q_w, \sigma) = q_{w\sigma}$ and the output function $\tau$ is given by
$\tau(q_w) = u_{\mathrm{val}_S(w)}$. □

4 Complexity of S-automatic sequences

The complexity function $p_u$ of an infinite sequence $u$ maps
$n \in \mathbb{N}$ to the number $p_u(n)$ of distinct factors of length $n$
which occur at least once in $u$. In this section, we will show that the
complexity of an $S$-automatic sequence is in $O(n^2)$ as a consequence of the
fact that every $S$-automatic sequence is generated by a substitution.



Recall that an infinite word $w$ generated by an iterated morphism has a
complexity such that

$$c_1 f(n) \le p_w(n) \le c_2 f(n)$$

where $f(n)$ is one of the following functions: $1$, $n$, $n \log\log n$,
$n \log n$ or $n^2$ [13]. For a survey on the complexity function, see for
example [?].

The next remark shows that an $S$-automatic sequence can reach at least the
same complexity as a word generated by a morphism.

Remark 4 For every infinite word $w$ generated by an iterated morphism
$\varphi$ over an alphabet $\Delta$ we can construct an $S$-automatic sequence
$u$ such that $\forall n \in \mathbb{N}$,

$$p_w(n) \le p_u(n).$$

We show how to proceed on the following example,

$$\Delta = \{0, 1\}, \qquad \varphi : 0 \mapsto 0101,\quad 1 \mapsto 11.$$

It is well known that $w = \varphi^\omega(0)$ is such that $p_w$ is of
complexity $O(n \log\log n)$ [13]. To the morphism $\varphi$, we associate a
finite automaton $M$ (if the morphism is not uniform then $M$ is not
deterministic). The set of states is $\Delta$, all the states are final and the
transition function $\delta$ is obtained by reading the productions of
$\varphi$ from left to right. For this purpose, we introduce a new ordered
alphabet $\Sigma$ such that $\#\Sigma = \sup_{\sigma \in \Delta} |\varphi(\sigma)|$.
Here, 0 gives the initial state (for we consider the word $\varphi^\omega(0)$)
and 1 the other state. Thus with $\Sigma = \{a < b < c < d\}$, we have
$\delta(0, a) = [\varphi(0)]_1 = 0$, $\delta(0, b) = [\varphi(0)]_2 = 1$, ...
and $M$ is then



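[Figure: the automaton $M$ with states 0 and 1; loop on 0 labelled a, c; edge
from 0 to 1 labelled b, d; loop on 1 labelled a, b.]

As is customary, the final states are denoted by double circles. The language
accepted by $M$ is $L = \{a, c\}^*\{b, d\}\{a, b\}^* \cup \{a, c\}^*$. The
numeration system $S$ is thus $(L, \Sigma, a < b < c < d)$. This kind of
construction can also be found in [10]. Now from $M$ we simply construct a DFAO
$M'$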

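[Figure: the DFAO $M'$, obtained from $M$ by adding a third state carrying a
loop labelled a, b, c, d.]

The way we find the output can be easily understood. The third state can have
any output, for this state is never reached with a word belonging to $L$. One
remarks that the $S$-automatic sequence obtained with $M'$ and $S$ is

$$u = \varphi^0(0)\,\varphi(0)\,\varphi^2(0)\,\varphi^3(0) \cdots$$

and thus every factor of $w = \varphi^\omega(0)$ belongs to $u$.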
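
The construction of this remark can be checked on small lengths with the sketch below. It takes $\varphi(0) = 0101$ and $\varphi(1) = 11$, the morphism matching the transitions and the language $L$ given in the remark; the output of a state is assumed to be the state itself, and missing transitions play the role of the third state. All function names are hypothetical.

```python
from itertools import product

PHI = {"0": "0101", "1": "11"}   # assumed morphism, consistent with L
SIGMA = "abcd"                   # ordered input alphabet a < b < c < d


def delta(state, letter):
    """delta(q, i-th letter of SIGMA) = i-th letter of phi(q), if it exists."""
    image = PHI[state]
    i = SIGMA.index(letter)
    return image[i] if i < len(image) else None   # None: word leaves L


def run(word, start="0"):
    state = start
    for letter in word:
        state = delta(state, letter)
        if state is None:
            return None
    return state


def u_block(length):
    """Outputs for the words of L of the given length, in lexicographic order."""
    outs = (run("".join(t)) for t in product(SIGMA, repeat=length))
    return "".join(o for o in outs if o is not None)


def phi_power(n, seed="0"):
    """The finite word phi^n(seed)."""
    word = seed
    for _ in range(n):
        word = "".join(PHI[c] for c in word)
    return word
```

Under these assumptions the block of $u$ produced by the words of length $n$ coincides with $\varphi^n(0)$, the empty word contributing the single letter $\varphi^0(0) = 0$.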

We now show that every $S$-automatic sequence is generated by a substitution.

Lemma 11 Let $\Sigma = \{\sigma_1 < \cdots < \sigma_n\}$,
$M = (K, s, \Sigma, \delta, F)$ be a DFA and $\alpha \notin K$. The morphism
$\varphi_M : K \cup \{\alpha\} \to (K \cup \{\alpha\})^*$ defined by

$$\alpha \mapsto \alpha s$$
$$k \mapsto \delta(k, \sigma_1) \cdots \delta(k, \sigma_n), \quad k \in K$$

produces the sequence $u_\varphi$ of the states reached by the words of
$\Sigma^*$, i.e., $\forall i \in \mathbb{N} \setminus \{0\}$,
$u_{i+1} = \delta(s, w_i)$ where $w_i$ is the $i$-th element of
$(\Sigma^*, <)$.

Proof. One can check easily by constructing $\varphi(\alpha)$,
$\varphi^2(\alpha)$, $\varphi^3(\alpha)$ (which are prefixes of $u_\varphi$)
that $u_\varphi$ satisfies the property. □
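
The lemma can be tested numerically on a small automaton. In the sketch below the fixed point of $\varphi_M$ is built letter by letter and compared with the states reached by the words of $\Sigma^*$ taken shortest first, then lexicographically. The DFA counting the letters a modulo 3 is test data chosen for this sketch, and the function names are hypothetical.

```python
from itertools import count, islice, product


def phi_M_fixed_point(delta, start, alphabet, length, alpha="alpha"):
    """First `length` letters of the fixed point of phi_M beginning with alpha."""
    word = [alpha, start]          # phi_M(alpha) = alpha s
    i = 1                          # index of the next letter to expand
    while len(word) < length:
        k = word[i]
        word.extend(delta[(k, x)] for x in alphabet)   # append phi_M(k)
        i += 1
    return word[:length]


def states_reached(delta, start, alphabet, how_many):
    """States delta(s, w) for the first `how_many` words w of Sigma*."""
    words = ("".join(t) for n in count(0) for t in product(alphabet, repeat=n))
    result = []
    for w in islice(words, how_many):
        state = start
        for x in w:
            state = delta[(state, x)]
        result.append(state)
    return result


# Test data (assumption): a complete DFA over {a < b} with states 0, 1, 2.
DELTA = {(q, "a"): (q + 1) % 3 for q in range(3)}
DELTA.update({(q, "b"): q for q in range(3)})
```

After dropping the leading letter $\alpha$, the two lists agree term by term, which is the statement $u_{i+1} = \delta(s, w_i)$.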

Proposition 12 Every $S$-automatic sequence is generated by a substitution.

Proof. Let $S = (L, \Sigma, <)$, $\mathcal{M}_L = (K, s, \Sigma, \delta, F)$ be
a DFA accepting $L$ and $u$ be an $S$-automatic sequence obtained with the DFAO
$\mathcal{M}_u = (K', s', \Sigma, \delta', \Delta, \tau)$. From these two
automata, we construct the product automaton
$M = (K \times K', (s, s'), \Sigma, \mu)$ where
$\mu((k, k'), \sigma) = (\delta(k, \sigma), \delta'(k', \sigma))$. We do not
give explicitly the final states of $M$. By Lemma 11, we associate to this
automaton a morphism
$\varphi_M : (K \times K') \cup \{\alpha\} \to ((K \times K') \cup \{\alpha\})^*$.
To conclude the proof, we construct the erasing morphism
$h : (K \times K') \cup \{\alpha\} \to \Delta^*$ defined by

$$h(\alpha) = \varepsilon, \qquad h((k, k')) = \begin{cases} \varepsilon & \text{if } k \notin F \\ \tau(k') & \text{otherwise.} \end{cases}$$

Indeed, $\varphi_M^\omega(\alpha)$ is the sequence of the states reached by the
words of $\Sigma^*$ in $M$ but we are only interested in the words belonging to
$L$ and in the corresponding output of $\mathcal{M}_u$. Thus $u$ is generated
by $(\varphi_M, h, \alpha)$. □
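
The proof can be replayed on the teaching example of Section 2.1. The sketch below assumes a three-state DFA for $a^*b^*$ and a four-state DFAO (output equal to the state) chosen to reproduce the printed terms of $u$; neither table is taken from the original figures, and the names are hypothetical.

```python
# DFA for L = a*b* (assumed state names): "A" = only a's read so far,
# "B" = at least one b read, "X" = sink entered after the factor "ba".
DELTA_L = {("A", "a"): "A", ("A", "b"): "B",
           ("B", "a"): "X", ("B", "b"): "B",
           ("X", "a"): "X", ("X", "b"): "X"}
FINAL_L = {"A", "B"}

# Assumed DFAO of the teaching example; the output of a state is the state.
DELTA_U = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0,
           (0, "b"): 0, (1, "b"): 3, (3, "b"): 2, (2, "b"): 1}


def generated_by_substitution(n_letters, alphabet="ab"):
    """Apply h to the first n_letters letters of the fixed point of phi_M."""
    word = ["alpha", ("A", 0)]             # phi_M(alpha) = alpha (s, s')
    i = 1
    while len(word) < n_letters:
        k, k2 = word[i]
        # phi_M((k, k')) lists the successors of (k, k') in the product automaton.
        word.extend((DELTA_L[(k, x)], DELTA_U[(k2, x)]) for x in alphabet)
        i += 1
    # h erases alpha and every pair whose first component is not final.
    return "".join(str(k2) for k, k2 in word[1:n_letters] if k in FINAL_L)
```

With 1024 letters of the fixed point (the letter $\alpha$ and the states of all words of length at most 9), the image under $h$ starts with the fifty terms of $u$ printed in Section 2.1.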

Dealing with erasing morphisms whenever one wants to determine the complexity
function of a sequence is painful. The next lemma allows us to get rid of
erasing morphisms.

Lemma 13 [?] If $f$ and $g$ are arbitrary morphisms with $f(g^\omega(a))$ an
infinite word, then there exists a non-erasing morphism $k$ and a coding $h$
(i.e., a letter-to-letter morphism $h$) such that
$f(g^\omega(a)) = h(k^\omega(a))$. □

Theorem 14 The complexity of an automatic sequence is in $O(n^2)$. Moreover,
there exists an automatic sequence $v$ and a positive constant $d'$ such that
$\forall n > 0$, $p_v(n) \ge d' n^2$.

Proof. 1) Let $u$ be an $S$-automatic sequence. By Proposition 12, $u$ is
generated by a substitution $(\varphi, h, a)$ and by Lemma 13 we can suppose
that $h$ is non-erasing. The word $u_\varphi = \varphi^\omega(a)$ is generated
by an iterated morphism and thus $p_{u_\varphi}(n) \le d n^2$. To conclude,
since $u = h(u_\varphi)$, recall that if $v$, $w$ are two infinite words and if
$h$ is a non-erasing morphism such that $h(v) = w$ then there exist positive
constants $a$, $b$ such that $p_w(n) \le a\, p_v(n + b)$ [13].

2) We show that there exist a language $L$ over an ordered alphabet and
a DFAO such that the corresponding automatic sequence $v$ has a complexity
function $p_v(n) \ge d' n^2$.

The morphism

$$\varphi : 0 \mapsto 01,\quad 1 \mapsto 12,\quad 2 \mapsto 2$$

generates the word $w = \varphi^\omega(0)$. Since 2 is a bounded letter (i.e.,
$|\varphi^n(2)|$ is bounded) and $2^n$ is a factor of $w$ for an arbitrary $n$,
there exists a positive constant $d'$ such that $p_w(n) \ge d' n^2$ (see [13]).
Using the same technique as in Remark 4, we construct an $S$-automatic sequence
$v$ such that $p_v(n) \ge p_w(n)$. One finds easily that the regular language
used in the numeration system $S$ is
$L = a^* \cup a^*ba^* \cup a^*ba^*ba^*$. □
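
The growth of the complexity function in part 2) can be observed experimentally. The sketch assumes the morphism $0 \mapsto 01$, $1 \mapsto 12$, $2 \mapsto 2$, which is the one matching the language $L = a^* \cup a^*ba^* \cup a^*ba^*ba^*$ under the construction of Remark 4. Counting factors in a finite prefix only gives a lower bound for $p_w(n)$, so this is an illustration and not a proof.

```python
PHI = {"0": "01", "1": "12", "2": "2"}   # assumed morphism, see the lead-in


def iterate(n, seed="0"):
    """The finite word phi^n(seed), a prefix of w when seed is 0."""
    word = seed
    for _ in range(n):
        word = "".join(PHI[c] for c in word)
    return word


def factor_count(word, n):
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = iterate(40)                     # 1 + 40 * 41 / 2 = 821 letters
counts = [factor_count(prefix, n) for n in range(1, 11)]
```

The list `counts` holds the number of distinct factors of lengths 1 to 10 found in the prefix; longer prefixes are needed before the counts for larger lengths stabilize.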

To conclude this section, we refine in a very simple way Proposition 12 to
give a characterization of the $S$-automatic sequences.

Let $T = (\varphi, h, c)$ and $T' = (\varphi', h', c')$ be two substitutions
such that $\varphi : \Sigma \to \Sigma^*$, $h : \Sigma \to \Delta^*$,
$\varphi' : \Sigma' \to \Sigma'^*$ and $h' : \Sigma' \to \Delta'^*$. A morphism
of substitutions $m : T \to T'$ is a surjective morphism
$m : \Sigma \cup \Delta \to \Sigma' \cup \Delta'$ such that

1. $m(c) = c'$, $m(\Sigma) = \Sigma'$, $m(\Delta) = \Delta'$

2. $m(\varphi(\sigma)) = \varphi'(m(\sigma))$, $\forall \sigma \in \Sigma$

3. $m(h(\sigma)) = h'(m(\sigma))$, $\forall \sigma \in \Sigma$.

For a regular language $L$ on the totally ordered alphabet $(\Sigma, <)$ and for
a DFAO $M = (K, s, \Sigma, \delta, \Delta, \tau)$, one can construct the
canonical substitution $T_{(L,<,M)}$ by proceeding in the same way as in
Proposition 12 with $\mathcal{M}_L$ equal to the minimal automaton of $L$ and
the DFAO $\mathcal{M}_u$ equal to a reduced and accessible copy of $M$.

To reduce $M$, one has to merge the states $p$, $q$ such that for all
$w \in \Sigma^*$, $\tau(\delta(p, w)) = \tau(\delta(q, w))$.

Definition 15 A substitution $T$ is an $(L, <, M)$-substitution if there exists
a morphism $m : T \to T_{(L,<,M)}$. This kind of construction has already been
introduced in [?] for linear numeration systems based on a Pisot number.

The next theorem is obvious and we state it without proof.

Theorem 16 Let $S = (L, \Sigma, <)$. The sequence $u \in \Delta^\omega$ is
$S$-automatic if and only if $u$ is generated by an $(L, <, M)$-substitution
for some DFAO $M$. □

5 Application to S-recognizable sets of integers

Proposition 12 gives a necessary condition for a set $X$ of integers to be
$S$-recognizable: the characteristic sequence $\chi_X \in \{0, 1\}^\omega$ has
to be generated by a substitution. Thus this proposition can be used as a tool
to show that a subset of $\mathbb{N}$ is not $S$-recognizable for any
numeration system $S$.

In the following $\mathcal{P}$ is the set of primes and $\chi_{\mathcal{P}}$ is
its characteristic sequence. We show that $\mathcal{P}$ is never
$S$-recognizable, but first we construct by hand a subset of $\mathbb{N}$ which
cannot be $S$-recognizable because its characteristic sequence is too complex.



Example 1 For $n \ge 3$, consider the $\binom{n}{3}$ words belonging to
$\{0, 1\}^n$ which contain exactly three '1' and concatenate these words
lexicographically ordered to obtain the word $w_{n-3}$. To conclude consider
the infinite word

$$w = w_0 w_1 w_2 \cdots = \underbrace{111}_{w_0}\ \underbrace{0111\ 1011\ 1101\ 1110}_{w_1}\ \underbrace{00111\ 01011 \cdots}_{w_2} \cdots.$$

By construction, it is obvious that for all positive constants $C$, there
exists $n_0$ such that $\forall n \ge n_0 : p_w(n) > C n^2$. Thus $w$ cannot be
generated by a substitution and the corresponding subset $W$ such that
$\chi_W = w$,

W = {0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, 17, 21, 22, 23, 25, 27, 28, ...},

is never $S$-recognizable.
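
The word $w$ and the first elements of $W$ can be regenerated with a few lines of Python. The function names are hypothetical, and positions are counted from 0 as in the listing of $W$ above.

```python
from itertools import combinations


def block(n):
    """Concatenation, in lexicographic order, of the words of {0,1}^n
    containing exactly three letters 1 (the word w_{n-3})."""
    words = []
    for ones in combinations(range(n), 3):
        words.append("".join("1" if i in ones else "0" for i in range(n)))
    return "".join(sorted(words))


def w_prefix(max_n):
    """The finite word w_0 w_1 ... w_{max_n - 3}."""
    return "".join(block(n) for n in range(3, max_n + 1))


w = w_prefix(6)
W = [i for i, letter in enumerate(w) if letter == "1"]
```

The list `W` starts with the elements printed in the example, which gives a quick check of the transcription of $w$.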

Proposition 17 For any numeration system $S$, $\mathcal{P}$ is not
$S$-recognizable.

Proof. In [11], [12], C. Mauduit shows using some density arguments that
$\chi_{\mathcal{P}} \in \{0, 1\}^\omega$ is not generated by a substitution
$(\varphi, h, a)$ where $h$ sends all the letters on 0 except one. A slight
adaptation of the proof leads to the conclusion for any letter-to-letter
morphism $h$. □